Overview

Dataset statistics

Number of variables: 11
Number of observations: 60239
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 6
Duplicate rows (%): < 0.1%
Total size in memory: 5.1 MiB
Average record size in memory: 88.0 B
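
The figures above can be cross-checked with a little arithmetic. The sketch below assumes that the total size is simply the number of observations times the average record size, and that MiB means 1024**2 bytes; the variable names are illustrative only.

```python
# Figures copied from the Dataset statistics block.
n_observations = 60239
avg_record_bytes = 88.0
duplicate_rows = 6

# Assumption: total size = rows x average record size, with 1 MiB = 1024**2 bytes.
total_mib = n_observations * avg_record_bytes / 1024**2
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Total size in memory: {total_mib:.1f} MiB")
print(f"Duplicate rows: {duplicate_pct:.3f}%")
```

Under these assumptions the total rounds to the reported 5.1 MiB, and the duplicate share is far below the 0.1% display threshold.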

Variable types

NUM: 10
BOOL: 1

Reproduction

Analysis started: 2020-07-26 14:56:19.931251
Analysis finished: 2020-07-26 14:56:48.135611
Duration: 28.2 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
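
As a quick sanity check, the reported duration follows from the two timestamps. This is a minimal sketch using only the standard library; the format string is an assumption about how the timestamps are written.

```python
from datetime import datetime

# Assumed timestamp layout: date, time, microseconds.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2020-07-26 14:56:19.931251", FMT)
finished = datetime.strptime("2020-07-26 14:56:48.135611", FMT)

duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.1f} seconds")
```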

Warnings

Duplicates: Dataset has 6 (< 0.1%) duplicate rows
High correlation: MBL is highly correlated with Misses
High correlation: Misses is highly correlated with MBL
High correlation: Virt_Memory is highly correlated with Memory_Footprint and 1 other fields
High correlation: Memory_Footprint is highly correlated with Virt_Memory and 1 other fields
High correlation: Res_Memory is highly correlated with Memory_Footprint and 1 other fields
Zeros: LLC has 8213 (13.6%) zeros
Zeros: MBL has 1494 (2.5%) zeros
Zeros: Memory_Footprint has 11819 (19.6%) zeros
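
The percentages in the zeros warnings are the zero counts divided by the number of observations. The sketch below recomputes them from the counts quoted above; it does not attempt to reproduce the threshold the tool uses to decide when to raise a warning.

```python
n_observations = 60239
# Zero counts as quoted in the warnings.
zero_counts = {"LLC": 8213, "MBL": 1494, "Memory_Footprint": 11819}

# Share of zeros per variable, rounded to one decimal as in the report.
zero_share = {
    name: round(100 * count / n_observations, 1)
    for name, count in zero_counts.items()
}

for name, pct in zero_share.items():
    print(f"{name} has {zero_counts[name]} ({pct}%) zeros")
```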

Variables

CPU_Utilization
Real number (ℝ≥0)

Distinct count: 14
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 98.50990720297483
Minimum: 0.0
Maximum: 100.0
Zeros: 364
Zeros (%): 0.6%
Memory size: 470.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 93.8
Q1: 100
Median: 100
Q3: 100
95-th percentile: 100
Maximum: 100
Range: 100
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 8.159073775
Coefficient of variation (CV): 0.0828249057
Kurtosis: 128.9987813
Mean: 98.5099072
Median Absolute Deviation (MAD): 0
Skewness: -11.03090186
Sum: 5934138.3
Variance: 66.57048487

Histogram with fixed size bins (bins=10)

Common values

Value             Count   Frequency (%)
100               51891   86.1%
93.8              6579    10.9%
93.3              1211    2.0%
0                 364     0.6%
87.5              152     0.3%
46.7              12      < 0.1%
6.7               10      < 0.1%
53.3              10      < 0.1%
43.8              3       < 0.1%
50                3       < 0.1%
Other values (4)  4       < 0.1%

Minimum 5 values

Value             Count   Frequency (%)
0                 364     0.6%
6.7               10      < 0.1%
13.3              1       < 0.1%
40                1       < 0.1%
43.8              3       < 0.1%

Maximum 5 values

Value             Count   Frequency (%)
100               51891   86.1%
93.8              6579    10.9%
93.3              1211    2.0%
87.5              152     0.3%
86.7              1       < 0.1%
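
Several of the descriptive statistics for CPU_Utilization are simple functions of one another. The sketch below recomputes them from the rounded values printed in the report, assuming the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, range = maximum - minimum).

```python
import math

# Rounded values copied from the CPU_Utilization statistics.
mean = 98.5099072
std = 8.159073775
q1, q3 = 100.0, 100.0
minimum, maximum = 0.0, 100.0

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV={cv:.10f} variance={variance:.6f} IQR={iqr} range={value_range}")
```

The IQR of 0 (and likewise the MAD of 0) is consistent with the frequency table: 86.1% of the observations equal 100, so Q1, the median, and Q3 all coincide.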
 

Frequency
Real number (ℝ≥0)

Distinct count: 19958
Unique (%): 33.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2873.2413205232497
Minimum: 999.499
Maximum: 3300.015
Zeros: 0
Zeros (%): 0.0%
Memory size: 470.6 KiB

Quantile statistics

Minimum: 999.499
5-th percentile: 2501.7309
Q1: 2799.999
Median: 2800
Q3: 2840.981
95-th percentile: 3300
Maximum: 3300.015
Range: 2300.516
Interquartile range (IQR): 40.982

Descriptive statistics

Standard deviation: 238.5179861
Coefficient of variation (CV): 0.08301355839
Kurtosis: -0.03377921086
Mean: 2873.241321
Median Absolute Deviation (MAD): 7.413
Skewness: 0.6850446526
Sum: 173081183.9
Variance: 56890.82971

Histogram with fixed size bins (bins=10)

Common values

Value                 Count   Frequency (%)
2800                  11547   19.2%
2799.999              9590    15.9%
3300                  5530    9.2%
3299.999              5231    8.7%
2500                  599     1.0%
2499.999              561     0.9%
2800.001              26      < 0.1%
2800.013              21      < 0.1%
2800.012              20      < 0.1%
2800.014              18      < 0.1%
Other values (19948)  27096   45.0%

Minimum 5 values

Value                 Count   Frequency (%)
999.499               1       < 0.1%
999.977               1       < 0.1%
999.993               1       < 0.1%
1000.027              1       < 0.1%
1252.027              1       < 0.1%

Maximum 5 values

Value                 Count   Frequency (%)
3300.015              1       < 0.1%
3300.014              3       < 0.1%
3300.001              1       < 0.1%
3300                  5530    9.2%
3299.999              5231    8.7%
 

IPC
Real number (ℝ≥0)

Distinct count: 332
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.6181013961055128
Minimum: 0.23
Maximum: 3.55
Zeros: 0
Zeros (%): 0.0%
Memory size: 470.6 KiB

Quantile statistics

Minimum: 0.23
5-th percentile: 0.59
Q1: 1.09
Median: 1.48
Q3: 2.29
95-th percentile: 3.23
Maximum: 3.55
Range: 3.32
Interquartile range (IQR): 1.2

Descriptive statistics

Standard deviation: 0.7827943431
Coefficient of variation (CV): 0.4837733562
Kurtosis: -0.5196506141
Mean: 1.618101396
Median Absolute Deviation (MAD): 0.67
Skewness: 0.5654955884
Sum: 97472.81
Variance: 0.6127669836

Histogram with fixed size bins (bins=10)

Common values

Value               Count   Frequency (%)
1.53                2123    3.5%
1.52                1123    1.9%
1.19                1078    1.8%
1.18                948     1.6%
1.15                880     1.5%
1.17                864     1.4%
1.16                855     1.4%
1.14                787     1.3%
2.28                755     1.3%
2.29                736     1.2%
Other values (322)  50090   83.2%

Minimum 5 values

Value               Count   Frequency (%)
0.23                3       < 0.1%
0.24                4       < 0.1%
0.25                4       < 0.1%
0.26                2       < 0.1%
0.27                2       < 0.1%

Maximum 5 values

Value               Count   Frequency (%)
3.55                2       < 0.1%
3.54                16      < 0.1%
3.53                61      0.1%
3.52                105     0.2%
3.51                165     0.3%
 

Misses
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 30263
Unique (%): 50.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24730.334036089578
Minimum: 2.0
Maximum: 155255.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 470.6 KiB

Quantile statistics

Minimum: 2
5-th percentile: 77
Q1: 1482.5
Median: 8966
Q3: 32234
95-th percentile: 94365.1
Maximum: 155255
Range: 155253
Interquartile range (IQR): 30751.5

Descriptive statistics

Standard deviation: 31992.6714
Coefficient of variation (CV): 1.293661111
Kurtosis: 1.903392875
Mean: 24730.33404
Median Absolute Deviation (MAD): 8869
Skewness: 1.592192476
Sum: 1489730592
Variance: 1023531023

Histogram with fixed size bins (bins=10)

Common values

Value                 Count   Frequency (%)
8                     221     0.4%
9                     164     0.3%
11                    145     0.2%
10                    144     0.2%
7                     111     0.2%
131                   90      0.1%
12                    88      0.1%
121                   85      0.1%
124                   84      0.1%
129                   81      0.1%
Other values (30253)  59026   98.0%

Minimum 5 values

Value                 Count   Frequency (%)
2                     6       < 0.1%
3                     10      < 0.1%
4                     18      < 0.1%
5                     15      < 0.1%
6                     14      < 0.1%

Maximum 5 values

Value                 Count   Frequency (%)
155255                1       < 0.1%
153301                1       < 0.1%
152864                1       < 0.1%
152786                1       < 0.1%
152743                1       < 0.1%
 

LLC
Real number (ℝ≥0)

ZEROS

Distinct count: 353
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5673.0487557894385
Minimum: 0.0
Maximum: 25344.0
Zeros: 8213
Zeros (%): 13.6%
Memory size: 470.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 288
Median: 2304
Q3: 8856
95-th percentile: 21600
Maximum: 25344
Range: 25344
Interquartile range (IQR): 8568

Descriptive statistics

Standard deviation: 7114.42777
Coefficient of variation (CV): 1.25407485
Kurtosis: 0.4341224935
Mean: 5673.048756
Median Absolute Deviation (MAD): 2304
Skewness: 1.287816861
Sum: 341738784
Variance: 50615082.49

Histogram with fixed size bins (bins=10)

Common values

Value               Count   Frequency (%)
0                   8213    13.6%
72                  3171    5.3%
144                 2102    3.5%
216                 1483    2.5%
288                 1129    1.9%
360                 967     1.6%
432                 855     1.4%
504                 810     1.3%
576                 704     1.2%
792                 620     1.0%
Other values (343)  40185   66.7%

Minimum 5 values

Value               Count   Frequency (%)
0                   8213    13.6%
72                  3171    5.3%
144                 2102    3.5%
216                 1483    2.5%
288                 1129    1.9%

Maximum 5 values

Value               Count   Frequency (%)
25344               127     0.2%
25272               93      0.2%
25200               101     0.2%
25128               60      0.1%
25056               51      0.1%
 

MBL
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct count: 27157
Unique (%): 45.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1893.5500888128954
Minimum: 0.0
Maximum: 12932.9
Zeros: 1494
Zeros (%): 2.5%
Memory size: 470.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1.6
Q1: 86.3
Median: 664
Q3: 2267.3
95-th percentile: 6918.47
Maximum: 12932.9
Range: 12932.9
Interquartile range (IQR): 2181

Descriptive statistics

Standard deviation: 2569.809915
Coefficient of variation (CV): 1.357138599
Kurtosis: 2.513856625
Mean: 1893.550089
Median Absolute Deviation (MAD): 660.6
Skewness: 1.724587855
Sum: 114065563.8
Variance: 6603922.997

Histogram with fixed size bins (bins=10)

Common values

Value                 Count   Frequency (%)
0                     1494    2.5%
0.1                   275     0.5%
4.9                   152     0.3%
0.4                   147     0.2%
4.4                   140     0.2%
4.6                   139     0.2%
0.6                   138     0.2%
3.9                   138     0.2%
4.1                   134     0.2%
0.8                   132     0.2%
Other values (27147)  57350   95.2%

Minimum 5 values

Value                 Count   Frequency (%)
0                     1494    2.5%
0.1                   275     0.5%
0.2                   124     0.2%
0.3                   91      0.2%
0.4                   147     0.2%

Maximum 5 values

Value                 Count   Frequency (%)
12932.9               1       < 0.1%
12918.4               1       < 0.1%
12917.9               1       < 0.1%
12911.7               1       < 0.1%
12909.3               1       < 0.1%
 

Memory_Footprint
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.17827985192317267
Minimum: 0.0
Maximum: 0.6
Zeros: 11819
Zeros (%): 19.6%
Memory size: 470.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.1
Median: 0.2
Q3: 0.3
95-th percentile: 0.4
Maximum: 0.6
Range: 0.6
Interquartile range (IQR): 0.2

Descriptive statistics

Standard deviation: 0.1325844162
Coefficient of variation (CV): 0.7436870446
Kurtosis: -0.9342883927
Mean: 0.1782798519
Median Absolute Deviation (MAD): 0.1
Skewness: 0.2809600172
Sum: 10739.4
Variance: 0.01757862742

Histogram with fixed size bins (bins=10)

Common values

Value  Count   Frequency (%)
0.1    17339   28.8%
0.3    13994   23.2%
0      11819   19.6%
0.2    10504   17.4%
0.4    5913    9.8%
0.5    607     1.0%
0.6    63      0.1%

Minimum 5 values

Value  Count   Frequency (%)
0      11819   19.6%
0.1    17339   28.8%
0.2    10504   17.4%
0.3    13994   23.2%
0.4    5913    9.8%

Maximum 5 values

Value  Count   Frequency (%)
0.6    63      0.1%
0.5    607     1.0%
0.4    5913    9.8%
0.3    13994   23.2%
0.2    10504   17.4%
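
Because Memory_Footprint has only seven distinct values, its frequency table fully determines the distribution. The sketch below rebuilds the percentage column and the mean from the counts alone; the dictionary layout and variable names are illustrative, not the tool's internals.

```python
# Value -> count, copied from the Memory_Footprint frequency table.
counts = {0.1: 17339, 0.3: 13994, 0.0: 11819, 0.2: 10504, 0.4: 5913, 0.5: 607, 0.6: 63}

total = sum(counts.values())

# Rows sorted by descending count, with the share rounded to one decimal.
rows = [
    (value, count, round(100 * count / total, 1))
    for value, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
]

# Weighted mean recovered from the table.
mean = sum(value * count for value, count in counts.items()) / total

for value, count, pct in rows:
    print(f"{value:<5} {count:>6} {pct:>5}%")
print(f"total={total} mean={mean:.10f}")
```

The counts sum to the 60239 observations, and the weighted mean agrees with the mean reported for the variable.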
 

Virt_Memory
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 4273
Unique (%): 7.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 444226.66073473997
Minimum: 0.0
Maximum: 1412276.0
Zeros: 10
Zeros (%): < 0.1%
Memory size: 470.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 16936
Q1: 191524
Median: 426208
Q3: 814764
95-th percentile: 881988
Maximum: 1412276
Range: 1412276
Interquartile range (IQR): 623240

Descriptive statistics

Standard deviation: 319442.6232
Coefficient of variation (CV): 0.7190982699
Kurtosis: -1.157182343
Mean: 444226.6607
Median Absolute Deviation (MAD): 267400
Skewness: 0.2632214119
Sum: 2.675976982e+10
Variance: 1.020435895e+11

Histogram with fixed size bins (bins=10)

Common values

Value                Count   Frequency (%)
16936                5540    9.2%
819076               4508    7.5%
629516               4238    7.0%
426208               3769    6.3%
504312               2842    4.7%
814764               2833    4.7%
241704               2588    4.3%
855696               1650    2.7%
876684               1379    2.3%
241700               1329    2.2%
Other values (4263)  29563   49.1%

Minimum 5 values

Value                Count   Frequency (%)
0                    10      < 0.1%
15920                494     0.8%
16936                5540    9.2%
19732                11      < 0.1%
19780                11      < 0.1%

Maximum 5 values

Value                Count   Frequency (%)
1412276              1       < 0.1%
1411008              1       < 0.1%
1408940              1       < 0.1%
1407532              1       < 0.1%
1407260              1       < 0.1%
 

Res_Memory
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 10088
Unique (%): 16.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 425991.85309185076
Minimum: 0.0
Maximum: 1363148.8
Zeros: 10
Zeros (%): < 0.1%
Memory size: 470.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 6784
Q1: 163464
Median: 420288
Q3: 780172
95-th percentile: 869004
Maximum: 1363148.8
Range: 1363148.8
Interquartile range (IQR): 616708

Descriptive statistics

Standard deviation: 320825.3607
Coefficient of variation (CV): 0.7531255782
Kurtosis: -1.169293808
Mean: 425991.8531
Median Absolute Deviation (MAD): 271028
Skewness: 0.282313574
Sum: 2.566132324e+10
Variance: 1.02928912e+11

Histogram with fixed size bins (bins=10)

Common values

Value                 Count   Frequency (%)
6788                  722     1.2%
623748                497     0.8%
420352                421     0.7%
1048576               390     0.6%
623740                377     0.6%
1153433.6             365     0.6%
420320                362     0.6%
6828                  360     0.6%
623728                349     0.6%
6872                  331     0.5%
Other values (10078)  56065   93.1%

Minimum 5 values

Value                 Count   Frequency (%)
0                     10      < 0.1%
568                   18      < 0.1%
576                   18      < 0.1%
588                   29      < 0.1%
604                   18      < 0.1%

Maximum 5 values

Value                 Count   Frequency (%)
1363148.8             90      0.1%
1258291.2             312     0.5%
1153433.6             365     0.6%
1048576               390     0.6%
1023692.8             1       < 0.1%
 

Allocated_Cache
Real number (ℝ)

Distinct count: 63
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.150844967545942
Minimum: -103.5
Maximum: 81.0
Zeros: 92
Zeros (%): 0.2%
Memory size: 470.6 KiB

Quantile statistics

Minimum: -103.5
5-th percentile: 2.25
Q1: 2.25
Median: 4.5
Q3: 6.75
95-th percentile: 18
Maximum: 81
Range: 184.5
Interquartile range (IQR): 4.5

Descriptive statistics

Standard deviation: 6.657130524
Coefficient of variation (CV): 1.292434652
Kurtosis: 40.08395749
Mean: 5.150844968
Median Absolute Deviation (MAD): 2.25
Skewness: -1.242998527
Sum: 310281.75
Variance: 44.31738682

Histogram with fixed size bins (bins=10)

Common values

Value              Count   Frequency (%)
2.25               27774   46.1%
4.5                15031   25.0%
6.75               6887    11.4%
9                  3353    5.6%
11.25              1530    2.5%
13.5               1164    1.9%
15.75              770     1.3%
18                 741     1.2%
20.25              698     1.2%
24.75              683     1.1%
Other values (53)  1608    2.7%

Minimum 5 values

Value              Count   Frequency (%)
-103.5             2       < 0.1%
-99                2       < 0.1%
-87.75             8       < 0.1%
-83.25             6       < 0.1%
-78.75             4       < 0.1%

Maximum 5 values

Value              Count   Frequency (%)
81                 2       < 0.1%
65.25              6       < 0.1%
63                 8       < 0.1%
60.75              8       < 0.1%
58.5               2       < 0.1%
 

ST
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 470.6 KiB

Value  Count   Frequency (%)
0      34243   56.8%
1      25996   43.2%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
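
The definition above translates directly into code. The following is a minimal standard-library sketch, not the implementation the report itself uses; the function name `pearson_r` and the toy data are illustrative assumptions.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

# Hypothetical toy data, not taken from the profiled dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_xy = pearson_r(x, y)
# Shifting and rescaling each variable separately (positive factors) leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 1 for b in y])
print(round(r_xy, 4), round(r_rescaled, 4))
```

The second call illustrates the invariance under changes of location and scale: both calls give the same r up to floating-point error.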

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
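
As a sketch of that recipe, the code below ranks each variable (ties receive the average of the ranks they span, which is an assumption about tie handling) and then applies the Pearson formula to the ranks. The helper names and the toy data are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical monotonic but nonlinear relationship: y = x**3.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(round(rho, 4), round(r, 4))
```

On this toy data ρ reaches 1 because the relationship is perfectly monotonic, while Pearson's r stays below 1 because it is not linear.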

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
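
The pair-counting description corresponds to the variant usually called tau-a. The sketch below implements exactly that; note that library implementations commonly use the tie-corrected tau-b instead, so results can differ when the data contain ties. The function name and toy data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1   # both variables order the pair the same way
        elif s < 0:
            discordant += 1   # the variables order the pair in opposite ways
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical toy data: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(round(tau, 4))
```

Here five of the six pairs are concordant and one is discordant, giving (5 - 1) / 6.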

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

       CPU_Utilization  Frequency  IPC   Misses  LLC     MBL    Memory_Footprint  Virt_Memory  Res_Memory  Allocated_Cache  ST
0      100.0            2800.000   2.31  5651.0  1944.0  368.1  0.1               202752.0     186688.0    2.25             0
1      100.0            2802.460   2.33  5635.0  2304.0  364.8  0.1               202884.0     186952.0    2.25             0
2      100.0            2802.418   2.37  5367.0  2664.0  328.4  0.1               203036.0     186952.0    2.25             0
3      100.0            2802.496   2.39  5008.0  2520.0  319.3  0.1               203180.0     187216.0    2.25             0
4      100.0            2800.000   2.34  5478.0  2880.0  361.2  0.1               203180.0     187216.0    2.25             0
5      93.8             2800.000   2.40  5062.0  2520.0  330.0  0.1               203336.0     187480.0    2.25             0
6      100.0            2800.000   2.34  5540.0  2376.0  385.6  0.1               203644.0     187744.0    2.25             0
7      100.0            2802.433   2.29  5978.0  2808.0  422.7  0.1               203644.0     188008.0    2.25             0
8      100.0            2802.416   2.32  5689.0  2304.0  411.1  0.1               203644.0     188008.0    2.25             0
9      100.0            2802.434   2.45  4650.0  2304.0  311.6  0.1               203804.0     188008.0    2.25             0

Last rows

       CPU_Utilization  Frequency  IPC   Misses  LLC     MBL    Memory_Footprint  Virt_Memory  Res_Memory  Allocated_Cache  ST
60229  100.0            2872.876   1.33  7392.0  504.0   526.8  0.2               504312.0     492524.0    2.25             0
60230  93.8             2871.079   1.42  7977.0  504.0   671.1  0.2               504312.0     492524.0    2.25             0
60231  93.3             2845.761   1.43  7612.0  576.0   687.3  0.2               504312.0     492524.0    2.25             0
60232  100.0            2849.984   1.39  8160.0  288.0   702.1  0.2               504312.0     492524.0    2.25             0
60233  100.0            2855.164   1.38  7907.0  576.0   556.7  0.2               504312.0     492524.0    2.25             0
60234  100.0            2885.374   1.41  8166.0  792.0   650.8  0.2               504312.0     492524.0    2.25             0
60235  100.0            2864.643   1.40  8043.0  216.0   718.4  0.2               504312.0     492524.0    2.25             0
60236  100.0            2841.778   1.39  7505.0  936.0   657.8  0.2               504312.0     492524.0    2.25             0
60237  100.0            2845.018   1.44  8357.0  216.0   738.0  0.2               504312.0     492524.0    2.25             0
60238  100.0            2848.752   1.43  6840.0  2232.0  545.1  0.2               504312.0     492524.0    2.25             0

Duplicate rows

Most frequent

       CPU_Utilization  Frequency  IPC   Misses  LLC     MBL    Memory_Footprint  Virt_Memory  Res_Memory  Allocated_Cache  ST  count
0      100.0            2799.999   2.38  96.0    0.0     2.5    0.0               16936.0      6716.0      4.50             1   2
1      100.0            2800.000   2.30  113.0   0.0     6.3    0.0               16936.0      6840.0      2.25             1   2
2      100.0            2800.000   2.37  128.0   0.0     4.8    0.0               16936.0      6744.0      4.50             1   2
3      100.0            2800.000   2.40  10.0    2376.0  0.0    0.0               16936.0      6828.0      11.25            1   2
4      100.0            2800.000   2.67  43.0    0.0     2.7    0.0               16936.0      6744.0      4.50             1   2
5      100.0            2800.000   2.70  11.0    1152.0  0.1    0.0               16936.0      6832.0      15.75            1   2
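
A "most frequent duplicates" table like the one above can be derived by counting identical rows. The sketch below uses `collections.Counter` on a small hypothetical list of rows (values borrowed from this report, but arranged for illustration only); it is not how the report tool itself is implemented.

```python
from collections import Counter

# Hypothetical mini-dataset: each tuple is one row in column order
# (CPU_Utilization, Frequency, IPC, Misses, LLC, MBL, Memory_Footprint,
#  Virt_Memory, Res_Memory, Allocated_Cache, ST).
rows = [
    (100.0, 2799.999, 2.38, 96.0, 0.0, 2.5, 0.0, 16936.0, 6716.0, 4.50, 1),
    (100.0, 2800.000, 2.30, 113.0, 0.0, 6.3, 0.0, 16936.0, 6840.0, 2.25, 1),
    (100.0, 2799.999, 2.38, 96.0, 0.0, 2.5, 0.0, 16936.0, 6716.0, 4.50, 1),
    (100.0, 2802.460, 2.33, 5635.0, 2304.0, 364.8, 0.1, 202884.0, 186952.0, 2.25, 0),
    (100.0, 2800.000, 2.30, 113.0, 0.0, 6.3, 0.0, 16936.0, 6840.0, 2.25, 1),
]

# Count identical rows and keep only those that occur more than once.
row_counts = Counter(rows)
duplicates = [(row, n) for row, n in row_counts.most_common() if n > 1]

for row, n in duplicates:
    print(n, row)
```

Each distinct duplicated row appears once in the result together with its count, which matches the shape of the table above.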